DETAILED ACTION

Specification 

1. The title of the invention is not descriptive. A new title is required that is clearly
indicative of the invention to which the claims are directed. 

The following title is suggested: Image and Video Indexing and Searching using 
Detected Objects 

Drawings 

2. The drawings are objected to under 37 CFR 1.83(a). The drawings must show
every feature of the invention specified in the claims. Therefore, the object tracking 
must be shown or the feature(s) canceled from the claim(s). No new matter should be 
entered. 

Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in 
reply to the Office action to avoid abandonment of the application. Any amended 
replacement drawing sheet should include all of the figures appearing on the immediate 
prior version of the sheet, even if only one figure is being amended. The figure or figure 
number of an amended drawing should not be labeled as "amended." If a drawing figure 
is to be canceled, the appropriate figure must be removed from the replacement sheet, 
and where necessary, the remaining figures must be renumbered and appropriate 
changes made to the brief description of the several views of the drawings for 
consistency. Additional replacement sheets may be necessary to show the renumbering 
of the remaining figures. Each drawing sheet submitted after the filing date of an 
application must be labeled in the top margin as either "Replacement Sheet" or "New
Sheet" pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, 
the applicant will be notified and informed of any required corrective action in the next 
Office action. The objection to the drawings will not be held in abeyance. 



Claim Rejections - 35 USC § 102 

1. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

(e) the invention was described in (1) an application for patent, published under section 122(b), by 
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 

2. Claims 1-5, 7-8, and 11-16 are rejected under 35 U.S.C. 102(b) as being 
anticipated by Jain et al. (US 5,983,237).

Consider claim 1, Jain discloses a method of identifying a user-specified object 
contained in one or more images of a plurality of images (see column 7, lines 10-23 
where Jain describes an image searching method involving a user query), the 
method comprising defining regions of objects in said images (see column 10, lines 6-14
describing each image containing visual senses also called visual objects),
computing a vector in respect of each of said regions based on the appearance of the
respective region, each said vector comprising a descriptor (see column 10, line 16
describing a feature vector describing a visual object), vector quantizing said
descriptors into clusters (see column 10, lines 43-47 describing grouping the 
feature vectors to cover a region), storing said clusters as an index with the images in 
which they occur (see column 2, line 57 to column 3, line 12 describing insertion of 
images and associated feature vectors into a database), defining regions of said 
user-specified object (see column 10, lines 24-26 describing computing features of 
a small region on an image), computing a vector in respect of each of said regions 
based on the appearance of said regions, each said vector comprising a descriptor, and 
vector quantizing said descriptors into said clusters (see column 11, lines 19-23 
describing obtaining feature vectors for a query image), searching said index and 
identifying which of said plurality of images contains said clusters so as to return the 
images containing said user-defined object (see column 11, lines 24-31 describing 
the comparison of feature vectors and returning a ranking of image matches from 
a database). 
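
For illustration only, the vector quantizing step recited in claim 1 can be read as assigning each descriptor to its nearest cluster center. The following is a minimal sketch under assumptions that are not drawn from Jain or from the application: descriptors are plain lists of numbers, the cluster centers are already computed, squared Euclidean distance is used, and the function name quantize is hypothetical.

```python
def quantize(descriptors, centers):
    """Return, for each descriptor, the index of its nearest cluster center."""
    def sq_dist(a, b):
        # Squared Euclidean distance between two equal-length vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(centers)), key=lambda k: sq_dist(d, centers[k]))
        for d in descriptors
    ]


# Hypothetical data: two cluster centers and three region descriptors.
centers = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.5], [9.0, 11.0], [0.2, 0.1]]
cluster_ids = quantize(descriptors, centers)
```

In this sketch the first and third descriptors fall into cluster 0 and the second into cluster 1; the resulting cluster indices are what an index over the images would store.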

Consider claim 16, Jain discloses a method of identifying a user-specified object 
contained in one or more image frames of a moving picture (see column 7, lines 10-23 
where Jain describes an image searching method involving a user query, and 
column 9, line 64 to column 10, line 3 describing the extension of this search to 
video), the method comprising associating a plurality of different 'visual aspects' with 
each of a plurality of respective objects in said moving picture (see column 10, lines 6-27
describing the process of indexing images based on the features detected
within that image), retrieving the 'visual aspects' associated with said user-specified
object (see column 11, lines 19-23 describing submitting the user query and
retrieving synonymous feature vectors), and matching said 'visual aspects'
associated with said user-specified object with objects in said frames of said moving 
picture so as to identify instances of said user-specified object in said frames (see 
column 11, lines 23-31 describing matching the synonym feature vectors with 
images in a database and sending back a ranking of the hits). 

Consider claim 2, Jain discloses comparing the clusters relating to the objects 
contained in the images identified as containing an occurrence of said user-specified 
object with the one or more clusters relating to said user-specified object, and ranking 
said images identified as containing an occurrence of said user-specified object
according to the similarity of the one or more clusters associated therewith to the cluster 
associated with said user-specified object (see column 11, lines 24-31 describing the 
comparison of feature vectors and returning a ranking of image matches from a 
database). 

Consider claim 3, Jain discloses that at least two types of viewpoint covariant 
regions are defined in respect of each of said images (see column 11, lines 19-23 
describing submitting the user query and retrieving equivalent query synonyms). 

Consider claim 4, Jain discloses that a descriptor is computed in respect of each 
type of viewpoint covariant region (see column 11, lines 19-23 describing that the 
equivalent query synonyms are represented by feature vectors).

Consider claim 5, Jain discloses that one or more separate clusters are formed in
respect of each type of viewpoint covariant region (see column 10, lines 43-47 
describing grouping the feature vectors to cover a region). 

Consider claim 7, Jain discloses that said user-specified object is specified as a 
sub-part of an image (see column 17, lines 47-53 and figure 6, describing feature 
spaces in an image as being disjoint regions). 

Consider claim 8, Jain discloses that identification of said user-specified object is 
performed by first vector quantizing the descriptor vectors in a sub-part of an image to 
precomputed cluster centers (see column 10, lines 43-54 describing grouping the 
feature vectors to cover a region or using one large feature to cover the full 
feature region). 

Consider claim 11, Jain discloses that each image or portion thereof is
represented by one or more cluster frequencies (see column 8, line 66 to column 9, 
line 3 describing the association of a set of feature vectors to describe the visual 
appearance of an image). 

Consider claim 12, Jain discloses that said cluster frequency is weighted (see 
column 9, lines 3-10 describing weights being associated with the sets of feature 
vectors). 
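
For illustration only, a weighted cluster frequency of the kind recited in claims 11 and 12 can be sketched as follows. The tf-idf weighting used here is an assumption chosen for the example; neither the claims as quoted above nor the cited passages of Jain specify this formula, and the function name weighted_frequencies is hypothetical.

```python
import math
from collections import Counter


def weighted_frequencies(image_clusters):
    """Return {image_id: {cluster_id: tf-idf weight}} for quantized images."""
    n_images = len(image_clusters)
    # Number of images in which each cluster occurs at least once.
    doc_freq = Counter(
        cid for ids in image_clusters.values() for cid in set(ids)
    )
    weights = {}
    for image_id, ids in image_clusters.items():
        counts = Counter(ids)
        total = len(ids)
        # Frequency of the cluster in this image, scaled down for clusters
        # that occur in many images (assumed tf-idf weighting).
        weights[image_id] = {
            cid: (n / total) * math.log(n_images / doc_freq[cid])
            for cid, n in counts.items()
        }
    return weights


# Hypothetical data: cluster ids observed in two images.
w = weighted_frequencies({"img_a": [3, 7, 7], "img_b": [7, 9]})
```

Under this assumed weighting, cluster 7 receives a weight of zero in both images because it occurs in every image, while clusters 3 and 9 keep positive weights.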

Consider claim 13, Jain discloses that a predetermined proportion of most 
frequently occurring clusters in said plurality of images are omitted from or suppressed 
in such index (see column 15, line 55 to column 16, line 14 describing a diversity 
maximization process that limits results by using match quotas).

Consider claim 14, Jain discloses that said index comprises an inverted file
structure having an entry for each cluster which stores all occurrences of the same 
cluster in all of said plurality of images and possibly more precomputed information 
about each cluster occurrence such as for example its spatial neighbours in an image 
(see column 17, line 45 to column 18, line 23 describing indexing feature vector 
groups by storing images together that are represented by the same feature 
vector cluster). 
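
For illustration only, an inverted file structure of the kind recited in claim 14 can be sketched as a mapping from each cluster to the images in which it occurs. This is a minimal sketch, not a description of Jain's index: the names build_inverted_index and search are hypothetical, and ranking by the number of shared clusters is an assumption made for the example.

```python
from collections import defaultdict


def build_inverted_index(image_clusters):
    """Map each cluster id to the sorted list of image ids in which it occurs."""
    index = defaultdict(set)
    for image_id, cluster_ids in image_clusters.items():
        for cid in cluster_ids:
            index[cid].add(image_id)
    return {cid: sorted(images) for cid, images in index.items()}


def search(index, query_clusters):
    """Rank images by how many distinct query clusters they contain."""
    votes = defaultdict(int)
    for cid in set(query_clusters):
        for image_id in index.get(cid, []):
            votes[image_id] += 1
    # Most shared clusters first; ties broken by image id for determinism.
    return sorted(votes, key=lambda i: (-votes[i], i))


# Hypothetical data: cluster ids observed in three images.
image_clusters = {"img_a": [3, 7, 7], "img_b": [7, 9], "img_c": [1]}
index = build_inverted_index(image_clusters)
ranked = search(index, [7, 9])
```

A query quantized to clusters 7 and 9 touches only the entries for those two clusters, so img_c is never visited and img_b ranks ahead of img_a.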

Consider claim 15, Jain discloses including the step of ranking said images using 
local image spatial coherence or global relationships of said descriptor vectors (see 
column 10, lines 21-25 describing features such as orientation, shape, and 
turning angle histograms and column 11, lines 32-43 describing ranking the 
image hits based on weightings of each feature vector). 



Claim Rejections - 35 USC § 103 

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

4. Claims 9-10 and 17-21 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Jain as applied to claims 1 and 16 above, and further in view of
Crabtree et al. (US 6,263,088).

Consider claim 9, Jain discloses the method according to claim 1, where images
are grouped using any of the techniques known in Computer Vision and Pattern
Recognition research at the time of invention (see column 22, lines 30-35). Jain does 
not explicitly disclose that the regions defined in each image are tracked through 
contiguous images and unstable regions are rejected. Crabtree discloses a 
correspondence graph manager which creates video tracks from regions of motion (see 
column 5, lines 20-22) and a region corresponder which outputs a score of the 
correspondence between regions of a previous frame, when this score is too low it tells 
the correspondence graph manager to stop tracking the current object (column 24, 
lines 24-32). 

It would have been obvious to one skilled in the art at the time the invention was 
made to modify the invention of Jain, and modify the region detection to include tracking 
of images, as taught by Crabtree, thus using a grouping technique known in the art at 
the time of invention, as discussed by Jain (see column 22, lines 30-35). 

Consider claim 10, Crabtree discloses that an estimate of a descriptor for a track
is computed from the descriptors throughout the track (see column 23, lines 42-50 
describing computing a mean vector from the set of data points whose cluster 
was matched from one frame to another). 
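
For illustration only, estimating a single descriptor for a track from the descriptors throughout the track can be sketched as a component-wise mean. This is a minimal sketch under the assumption that every frame contributes a descriptor of the same length; the function name track_descriptor is hypothetical and is not taken from Crabtree or from the application.

```python
def track_descriptor(descriptors):
    """Average the per-frame descriptors of one track, component by component."""
    if not descriptors:
        raise ValueError("a track needs at least one descriptor")
    n = len(descriptors)
    # zip(*descriptors) groups the k-th component of every frame together.
    return [sum(component) / n for component in zip(*descriptors)]


# Hypothetical data: descriptors of one region observed in three frames.
mean = track_descriptor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```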

Consider claim 17, Jain discloses the method according to claim 16, wherein the 
'visual aspects' associated with an object are obtained using any of the techniques 
known in Computer Vision and Pattern Recognition research at the time of invention (see
column 22, lines 30-35). Jain does not explicitly disclose using one or more 
sequences or shots of a moving picture in which said object occurs. Crabtree discloses 
a method for breaking a video into tracks representing the motion of detected objects 
(see column 2, line 65 to column 3, line 19). 

It would have been obvious to one skilled in the art at the time the invention was 
made to modify the invention of Jain, and modify the feature detection to operate on 
video clips of specific objects, as taught by Crabtree, thus using a grouping technique 
known in the art at the time of invention, as discussed by Jain (see column 22, lines 
30-35). 

Consider claim 18, Jain discloses the method according to claim 16. Jain does 
not explicitly disclose tracking said object through a plurality of image frames in a 
sequence. Crabtree discloses a method of tracking movement of objects through a video
scene (see column 2, line 65 to column 3, line 1). 

It would have been obvious to one skilled in the art at the time the invention was 
made to modify the invention of Jain, and modify the feature detection to operate on 
video clips of specific objects that have been tracked through a video sequence, as 
taught by Crabtree, thus allowing complex objects to be tracked in an inexpensive 
manner, as discussed by Crabtree (see column 2, lines 58-63).

Consider claim 19, Crabtree discloses defining affine invariant regions of objects
in said image frames and tracking one or more regions through a plurality of image 
frames in a sequence (see column 18, line 59 to column 19, line 7 describing 
features used for image tracking, including moment invariant features). 

Consider claim 20, Crabtree discloses that in the event that a track terminates in 
an image frame of a sequence, propagating the track to either following or preceding 
image frames in the sequence, so as to create a substantially continuous track 
throughout the image frames in the sequence (see column 5, lines 20-22 describing a 
correspondence graph manager which creates video tracks from regions of 
motion and column 24, lines 24-32 describing a region corresponder which 
outputs a score of the correspondence between regions of a previous frame, 
when this score is too low it tells the correspondence graph manager to stop 
tracking the current object). 

Consider claim 21, Crabtree discloses tracked regions being grouped into objects
according to their common motion using constraints arising from rigid or semi-rigid
object motion (see column 25, lines 57-67 describing a split/merge resolver which 
uses motion features of regions to calculate confidence values between frames, 
thus determining whether an object is the same as the previous frame). 

5. Claim 6 is rejected under 35 U.S.C. 103(a) as being unpatentable over Jain as 
applied to claim 3 above, and further in view of Schaffalitzky et al., "Multi-view matching
for unordered image sets" and Matas et al., "Robust Wide Baseline Stereo from
Maximally Stable Extremal Regions". 

Consider claim 6, Jain discloses the method according to claim 3, where images 
are grouped using any of the techniques known in Computer Vision and Pattern
Recognition research at the time of invention (see column 22, lines 30-35). Jain does 
not explicitly describe using at least two types of viewpoint covariant regions including 
Shape Adapted and Maximally Stable regions respectively. Schaffalitzky discloses a 
shape adapted method to extract viewpoint covariant regions (see page 4 describing 
invariant neighbourhoods). Matas discloses a maximally stable method to extract 
viewpoint covariant regions (see page 386 describing maximally stable extremal 
regions). 

It would have been obvious to one skilled in the art at the time the invention was 
made to modify the invention of Jain, and modify the detection of viewpoint invariant 
regions to use shape adapted and maximally stable regions, as taught by Schaffalitzky
and Matas, thus using grouping techniques known in the art at the time of invention, as 
discussed by Jain (see column 22, lines 30-35). 

Conclusion 

6. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 

a. Jain et al. (US 5,893,095) discloses a similarity engine for content-based
retrieval of images.

b. Lim (US 6,574,378) discloses a method for indexing and retrieving images
using visual keywords. 

Contact Information 

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Eric J. Mohr whose telephone number is (571) 270-5140.
The examiner can normally be reached on 7:30am-5pm M-Th, 7:30am-4pm
Alternate Fridays.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Nick Corsaro, can be reached on (571) 272-7876. The fax phone number for
the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 